Practical Gauss-Newton Optimisation for Deep Learning

Abstract

The curvature matrix depends on the specific optimisation method and will often be only an estimate. For notational simplicity, the dependence of f̂ on θ is omitted. Setting C to the true Hessian matrix of f would make f̂ the exact second-order Taylor expansion of the function around θ. However, when f is a nonlinear function, the Hessian can be indefinite, which leads to an ill-conditioned quadratic approximation f̂. For this reason, C is usually chosen to be positive semi-definite by construction, such as the Gauss-Newton or the Fisher matrix. In the experiments discussed in the paper, C can be either the full Gauss-Newton matrix Ḡ, obtained from running Conjugate Gradient as in (Martens, 2010), or a block-diagonal approximation to it, denoted by G̃. The analysis below is independent of whether this approximation is based on KFLR, KFRA, KFAC or whether it is the exact block-diagonal part of Ḡ, hence there will be no reference to a specific approximation.
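
The excerpt does not write out the quadratic model it refers to; under the surrounding definitions it takes the standard form

f̂(δ) = f(θ) + ∇f(θ)ᵀ δ + ½ δᵀ C δ,

so that setting C = ∇²f(θ) recovers the exact second-order Taylor expansion around θ, while choosing C to be the Gauss-Newton matrix Ḡ (or its block-diagonal approximation G̃) keeps the model positive semi-definite.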

Similar articles

Practical Gauss-Newton Optimisation for Deep Learning

We present an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks. Our resulting algorithm is competitive against state-of-the-art first-order optimisation methods, with sometimes significant improvement in optimisation performance. Unlike first-order methods, for which hyperparameter tuning of the optimisation parameters is often a laborious process...
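
As a rough illustration of the idea (a hedged sketch, not the algorithm from the paper; the function name, the per-layer inputs and the damping constant are assumptions made for this example), a block-diagonal curvature matrix lets each layer's update be computed independently by solving a small regularised linear system:

import numpy as np

def block_diagonal_newton_step(layer_grads, layer_curvatures, damping=1e-3):
    """Per-layer updates delta_l = -(G_l + damping * I)^{-1} g_l.

    layer_grads: list of 1-D gradient vectors, one per layer.
    layer_curvatures: list of symmetric PSD curvature blocks G_l,
        e.g. the block-diagonal part of the Gauss-Newton matrix.
    damping: Tikhonov damping that keeps each block well conditioned.
    """
    updates = []
    for g, G in zip(layer_grads, layer_curvatures):
        # Layers decouple under the block-diagonal approximation,
        # so each solve touches only one layer's parameters.
        A = G + damping * np.eye(G.shape[0])
        updates.append(-np.linalg.solve(A, g))
    return updates

Because each block only has a single layer's dimensionality, these solves are far cheaper than a solve against the full curvature matrix, which is the practical appeal of the block-diagonal approximation.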

Diffeomorphic registration using geodesic shooting and Gauss–Newton optimisation

This paper presents a nonlinear image registration algorithm based on the setting of Large Deformation Diffeomorphic Metric Mapping (LDDMM), but with a more efficient optimisation scheme, both in terms of memory required and the number of iterations required to reach convergence. Rather than perform a variational optimisation on a series of velocity fields, the algorithm is formulated to use a ...

Revisiting Horn and Schunck: Interpretation as Gauss-Newton Optimisation

In this paper we revisit the Horn and Schunck optical flow method [1], and focus on its interpretation as Gauss-Newton optimisation. We explicitly demonstrate that the standard incremental version of the Horn and Schunck (HS) method is equivalent to Gauss-Newton (GN) optimisation of the non-linearised energy, consisting of the sum of squared differences (SSD) criterion and diffusion regularisa...

Block-diagonal Hessian-free Optimization

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are rarely applied to deep learning in practice because of high computational cost and the need for model-dependent algorithmic variations. We introduce a variant of ...

Distributed Newton Methods for Deep Neural Networks

Deep learning involves a difficult non-convex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but the calculation of function, gradient, and Hessian is expensive. In particular, the communication and the synchronization cost may become a bottleneck. In this...

Publication year: 2017